perm filename SLR1.JP[UP,DOC]1 blob sn#248837 filedate 1976-11-26 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00007 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002				How to use the SLR1 parser generator
C00005 00003	 			A sample input grammar
C00009 00004				SLR1's output
C00013 00005				Semantics routines
C00016 00006				SEM : The semantic routines
C00022 00007				The default scanner
C00036 ENDMK
C⊗;
			How to use the SLR1 parser generator

The following is a  sample run of SLR1.  Notice that there are  facilities
for checking the grammar before generating  the parser. If you enter E  on
the grammar there is  no way as of  yet to return from  E directly to  the
program. The input grammar appears in the following page.

.r slr1
Production file? gram
Productions to be Listed? --Terminate With <crlf>: <assignment>
The Productions are:

<ASSIGNMENT>--><LEFTSIDE> := <AE> 
Productions to be Listed? --Terminate With <crlf>: <leftside>
The Productions are:

<LEFTSIDE>-->ID 
Productions to be Listed? --Terminate With <crlf>: 
Do you want to see the symbol table? y
	ID		=   1
	NUM		=   2
	REAL		=   3
	STRING		=   4
	:=		=   5
	+		=   6
	-		=   7
	*		=   8
	/		=   9
	↑		=   10
	(		=   11
	)		=   12
Want to edit grammar? 
Start Symbol? <assignment>
Generated 29 production states.

Single reduction eliminated: 3
Unoptimized number of states: 30
optimized number of states: 27

Do you want a dump of the relational graph? n
File Name for Generated Parser? arith.sai
Generating parsing actions...  done!

End of SAIL execution

↑C
.

Notice that negative answers  to questions can be  given by typing  CR-LF.
The relational graph of  SLR1 items is always  output to file DUMP.PDL  in
the alias directory. All other files  are defined by the luser. If  lookup
for a file  fails the  loser will be  reprompted. SLR1  accepts full  file
names.
 			A sample input grammar

The following is an SLR1 grammar for arithmetic expressions (for further
info on SLR1 grammars see Aho & Ullman Vol II), in the input format required
by the parser generator.


	  ε ID NUM REAL STRING $
	  6 $<ASSIGNMENT>--> <LEFTSIDE> := <AE> $
	  <AE>--><SAE> $
	  <SAE>--><TERM> $
	  0 $<SAE>--><SAE> + <TERM> $
	  1 $<SAE>--><SAE> - <TERM> $
	  <TERM>--> <FACTOR> $
	  <TERM>--> <FACTOR1> $
	  3 $<TERM>--><TERM> * <FACTOR> $
	  4 $<TERM>--><TERM> / <FACTOR> $
	  <FACTOR>--> <PRIMARY> $
	  <FACTOR1>--> <PRIMARY> $
	  2 $<FACTOR>--><FACTOR> ↑ <FACTOR1> $
	  5 $<PRIMARY>-->ID $
	  5 $<LEFTSIDE>--> ID $
	  8 $<PRIMARY>-->NUM $
	  7 $<PRIMARY>-->REAL $
	  9 $<PRIMARY>-->STRING  $
	  $<PRIMARY>-->( <AE> )$
	    
The format for the input grammar should be apparent from the above.
	Non-terminal   ≡   <..garbagechars..>   (exception: <> is a terminal!)
			   Also notice that chars between brackets cannot be
			   brackets themselves! BEWARE!!
        Terminal       ≡   ..any sequence of chars which is not a nonterminal..
	Production     ≡   { integer $ } non-terminal --> ..rhs.. $
Notice that  the  production  prefix  is  optional.   If  it  appears,  it
associates the  production with  a  semantic routine  to be  applied  when
production succeeds (see section on  semantics). Notice that the ending  $
is necessary to signal the end of  a production (yes, you do have to  type
--> !!!).  The first  line of  a grammar  SHOULD always  appear. It  binds
distinguished terminals in  the grammar  with the  default scanner  syntax
classes. The binding is positional with the first terminal appearing being
bound to syntactic class 0. Notice  that this only applies to the  default
scanner. For the  daring few  who provide  their own  scanner a  different
syntax class convention may apply. Nevertheless SLR1 is equipped with this
binding feature and it must be used in any luser provided scanner.

The current syntax class assignment (and the associated token syntax) is
	
	      0    →    epsilon (null terminal)
	      1    →    identifier (alphanumeric)
	      2    →    integer
	      3    →    real (nnnn.mmmmmEsll where s = + or -
						   nnnn cannot be null
						   ll is the exponent)
	      4    →    string ('...any chars...')
			to include a ' in a string use ''.

Comments can be typed in as "comment-text" and interspersed freely in the source
between syntactic elements (not within!).
			SLR1's output
 
The output of  SLR1 comes in  two flavors. One  is the parser,  relatively
efficient although not optimized (an optimization effort is done to remove
single productions to cut down on the number of states). The other is  the
optional dump of the  relational graph and LR  items. This is very  useful
specially for checking for SLR1 bugs,  and that the input grammar  satisfy
the SLR1 test and not be ambiguous.  The one YOU are really interested  in
is the parser. The parser uses three load modules. Two of them have to  be
provided always by the user, and the third is provided by the system. This
one is  the scanner.  Notice that  you  can also  provide the  scanner  by
suitable modification of the "require" statement in the generated  parser.
The scanner is slr1a.rel[1,3], and the semantic processors (ie the other 2
files) are proces and sem in the luser's alias directory. THEY MUST ALWAYS
BE PROVIDED (...unless you hack your generated parser not to need them).
 
 
			Using your new parser
 
Compile and load it. When  started it will prompt you  with a *. You  have
two options here. Either  type in a  file name, and  parser will read  the
program to be parsed from the corresponding  file or type a CR-LF and  the
parser will accept input from  the tty. EOF is  signalled by either ↑Z  in
ttys, ⊗lf in DDs or edit-null-edit-lf in Datamedias.
 
 
			Who to complain to
 
If you have any suggestions, comments, bugs or want some more info see
either JP or JLS.

			Error detection and recovery

No attempt  to  deal with  automatic  generation of  error  detection  and
recovery routines  has  been  made.  Nevertheless,  it  should  be  rather
straight-forward to add these into the generated parser. Some ideas can be
found in Aho and Ullman and in the proceedings of the last ACM  conference
(1976) at Houston.  At present the only facility provided is the  flagging
of errors and the message  SYNTAX ERROR ENCOUNTERED (MLISP2 lusers:  don't
you love this message?). Suggestions welcome.
			Semantics routines

This section gives a brief overview on how to write the semantic routines
for your new parser. The  following is a listing  of files SEM and  PROCES
for the sample grammar given in  the preceding section. Notice the  needed
internal and external variables needed for communication with the rest  of
the system.  SEM routines  are invoked as reductions  are done. PROCES  is
only invoked  after the  whole input  string has  been recognized  by  the
parser, and as  such could  be used for  generating code  or loading  other
passes for your translator, or could conceivably not be needed at all, for
instance in an interpreter. The PROCES given below is just the skeleton of
what you could write, since it is just a dummy program for the  assignment
grammar  given  before.  Comments  to  the  global  data  structures   and
interfaces are given after the listings.

			PROCES.SAI: The postprocessor skeleton

entry;
begin "PROCESS"
  define cl = "('15&'12)",
	 #  = "COMMENT";

  external integer array
	polstack[1:100]; # The parse stack.;
  external integer
	polpointer; # The parse stack pointer;
  external procedure printpolstack; # prints the parse stack;
  internal procedure process; # called from the generated parser;
  begin
    printpolstack; # In this case just prints the parse stack;
    polpointer←0; # and resets the stack pointer;
  end;

end "PROCESS";
			SEM : The semantic routines

Again the following is a sample skeleton for the semantic routines for
the assignment grammar given before. Notice that no attempt is being made
here of interpreting or translating the expressions.

entry;
begin "SEMANTICS"
  define
	cl = "('15&'12)",
        #  = "COMMENT";
  external integer procedure symtabentry(string str);
  external integer index,intval;
  external real realval; # scanned real is stored here!;
  external string strval; # scanned string store here by scanner.;
  external string array name[0:310]; # symbol table;
  internal record_pointer (any_class) array polstack [0:100];
  internal integer polpointer;

  #	The following are the parse stack records. Tag = 1 => Id
  #	Tag = 2 => Integer, Tag = 3 => Real, Tag = 4 => String
  #	Realentry is for type 3. Intentry for the others;

  record_class intentry  (integer tag, val);
  record_class realentry (integer tag; real val);
  record_class strentry  (integer tag; string val);

# 	Initialize the polish stack.;
  internal simple procedure initsem;
  begin
    polpointer ← 0;
  end;

#	This procedure prints the current state of the parsestack;
  internal procedure printpolstack;
  begin "PRINTPOLSTACK"
    integer jj;
    outstr(cl&"Parse Stack:"&cl);
    for jj←polpointer step -1 until 1 do
        begin
	  print(jj,": ");
	  case (intentry:tag[polstack[jj]]-1) of
	  begin
	      print(name[intentry:val[polstack[jj]]],cl);
	      print(intentry:val[polstack[jj]],cl);
	      print(realentry:val[polstack[jj]],cl);
	      print("""",strentry:val[polstack[jj]],"""",cl)
	  end;
	end;
  end "PRINTPOLSTACK";

internal procedure semantics(integer rule);
begin "SEM"
  integer idindex; "here is what is pushed when an id or op is going to be dumped"

  procedure pushpol(integer tag1(1));
  begin "PUSHPOL"
    record_pointer (any_class) p;
    if tag1 = 3 then "push a real"
       begin
     	 p ← new_record(realentry);
   	 realentry:tag[p] ← tag1;
	 realentry:val[p] ← realval;
       end
    else 
    if tag1 = 4 then "push a string"
       begin
	 p ← new_record(strentry);
	 strentry:tag[p] ← tag1;
	 strentry:val[p] ← strval;
       end
    else begin
	   p ← new_record(intentry);
	   intentry:tag[p] ← tag1;
	   case (tag1-1) of 
	        begin
		  intentry:val[p] ← idindex;
		  intentry:val[p] ← intval
		end;
	 end;
    polstack[polpointer ← polpointer+1] ← p
  end "PUSHPOL";

# The number you give as a prefix of a production is really an index into
  this case statement. Here we are only pusshing objects onto the stack;

  case rule of
  begin

    begin "+"
      idindex ← symtabentry("+");
      pushpol(1);
    end "+";

    begin "-"
      idindex ← symtabentry("-");
      pushpol(1);
    end "-";

    begin "↑"
      idindex ← symtabentry("↑");
      pushpol(1);
    end "↑";

    begin "*"
      idindex ← symtabentry("*");
      pushpol(1);
    end "*";

    begin "/"
      idindex ← symtabentry("/");
      pushpol(1);
    end "/";

    begin "id"
      idindex ← index;
      pushpol(1);
    end "id";
    
    begin ":="
      idindex ← symtabentry(":=");
      pushpol(1);
    end ":=";

    begin "REAL"
      pushpol(3)
    end "REAL";

    begin "NUM"
      pushpol(2)
    end "NUM";

    begin "STRING"
      pushpol(4);
    end "STRING"

  end;
  return;
end "SEM";
end "SEMANTICS";
			The default scanner

The following is the default scanner for generated parsers.
entry;
begin "ACCEPT"
  define
	crlf = "('15&'12)",
	# = "comment",
	unknown = "999",
	symtabsize = "311",
	symtabsizem1 = "(symtabsize-1)",
	hash(x) = "(((cvasc(0&x)) mod symtabsizem1)+1)";

#	Notice that polstack has the associated poltag stack in which
#	stack entries are tagged;
           
external record_pointer (any_class) array polstack[1:100];
external integer polpointer,parse,parseflag;

#	Notice that token and itval are the interface with the slr1 parser.;
external integer count, token;
external simple procedure push(integer astate);

internal integer inchan, brchar, flag, index;
#	our string space;

# The following three vars hold scanned values;
internal real realval;
internal integer intval;
internal string strval;	

internal string bstr, inputstr;
internal real array value[0:symtabsizem1];
internal integer array type[0:symtabsizem1];
internal string array name[0:symtabsizem1];

real	int;
integer	jj, indx;
string	t;

forward internal integer procedure symtabentry(string x);

# 	Some useful procedures;


  simple procedure mylop(reference string s);
#	Eliminate the first char of s;
  begin
    integer i;
    i ← lop(s);
  end;

  simple procedure deblank(reference string s;integer i);
#	Eliminate all i's from the begining of s;
  begin "DEBLANK"

    while s = i do mylop(s);

  end "DEBLANK";


internal integer procedure symtabentry(string x);
begin "SYMTABENTRY"
  integer i,j,k;
  k ← 0;
  j ← (i ← hash(x));
  while ¬(length(name[i])=0 ∨ equ(name[i],x)) do
     begin
       if equ(name[i]," ") ∧ k=0 then k ← i;
       i ← (i+j) mod symtabsize;
     end;
  if (i=0) ∧ (k=0) then outstr("Symtab Full"&crlf);
  if length(name[i])=0 then
     begin
       if k then i ← k;
       name[i] ← x;
       type[i] ← unknown;
       value[i] ← '400000000001;
     end;
  return(i);
end "SYMTABENTRY";

internal procedure initsymtab(string str);
begin "INITSYMTAB"
  string field1, field2;
  integer inch, flag, flag1, symcount, brchar;

  procedure error;
  begin
    outstr("Can't happen -- Strange symbol table!"&crlf);
    call(0,"Exit");
  end;

  setbreak(3,"$",'11&'40,"isk");
  setbreak(4,"0123456789"&'11&'40,'11&'40,"xsk");
  setbreak(5,'11,null,"insk");
  symcount ← 0;
  do begin
       field1 ← scan(str,3,brchar);
       if ¬(brchar = "$") then error;
       field2 ← scan(str,4,brchar);
       if ¬(brchar = '15) then error;
       index ← symtabentry(field1);
       type[index] ← cvd(field2);
       name[index] ← field1;
       symcount ← symcount + 1;
       scan(str,5,brchar);
     end
  until str = null;
end "INITSYMTAB";

procedure scanstring;
begin "scanstring"
   string s1;
   integer b;
#  skip first char which must be ';
   mylop(bstr);
   s1 ← null;
   while true do
     begin
       s1 ← s1&scan(bstr,7,b);
       if (b='15) ∨ (b=null) then
	  " Notice horrendous fixup for reading from TTY instead of DSK:
	    This is needed since activation char might have thrown in line editor
	    and line will not end in CRLF!!"
	  begin
	    s1 ← s1&crlf;
	    # get a new line;
	    inputstr ← bstr ← input(parse,6);
	    if parseflag then
	       begin
		 outstr("Unexpected EOF. String delimiter missing."&crlf);
		 done;
	       end;
	  end
       else if bstr = "'" ∧ b = "'" then
	       begin s1 ← s1&"'"; mylop(bstr) end
	    else done;
    end;
    token ← 4;
    strval ← s1;
end "scanstring";

procedure scanquotes;
begin "scanquotes"
  integer b;
#  skip first char which must be ";
#  the last char must also be ";
  mylop(bstr);
  while true do
     begin
       scan(bstr,9,b);
       if (b='15) ∨ (b=null) then "notice extremely unclean code for TTY hack"
	  begin
	    # get a new line;
	    inputstr ← bstr ← input(parse,6);
	    if parseflag then
	       begin
		 outstr("Unexpected EOF. Comment delimiter missing."&crlf);
		 done;
	       end;
	  end
       else done;
    end;
end "scanquotes";

procedure scancomment;
begin "scancomment"
#  skip first char which must be ".....comment.....";
#  last char must be sc;
  bstr ← bstr[8 to ∞];
  while true do
     begin
       while (bstr≠null) ∧ (bstr≠'15) ∧ (bstr≠";") do mylop(bstr);
       if (bstr='15) ∨ (bstr=null) then "notice extremely unclean code for TTY hack"
	  begin
	    # get a new line;
	    inputstr ← bstr ← input(parse,6);
	    if parseflag then
	       begin
		 outstr("Unexpected EOF. Comment delimiter missing."&crlf);
		 done;
	       end;
	  end
       else begin mylop(bstr); done; end;
    end;
end "scancomment";

procedure scannum;
begin "scannum"
  string s1;
  real r;
  integer b,f,l;
  

  simple procedure error (string str);
#	Send an error msg;
  begin 
    outstr(str&crlf&"  input string : "&s1)
  end;

  s1 ← scan(bstr,8,b);

  if length(s1) > 11 then begin
		            error("Number too long. Truncated to 11 digits.");
			    s1 ← s1[1 to 11];
			  end;

  l ← cvd(s1);
  if bstr = "." then "Could be a real!"
     begin
    	mylop(bstr);
	if bstr = "." then "...sorry! it was an integer."
	   begin
	     bstr ← "."&bstr; "Yea kid. Put it back."
	     token ← 2;
	     intval ← l;
	     return;
	   end;
	"OK. Now we really know its a real"

        r ← l;
	if ("0" ≤ bstr ≤ "9") then 
	   "here comes the fractional part.."
	   begin  
	     s1 ← scan(bstr,8,b);
	     if length(s1) > 11 then
		begin
		  error("Fractional part too long. Truncated to 11 digits.");
		  s1 ← s1[1 to 11];
		end;
	     r ← r + cvd(s1)/(10↑length(s1));
           end;

        "Look for an exponent in E notation"
        if bstr = "E" then
           begin
  	    mylop(bstr);
	    if (bstr = "+") ∨ (bstr = "-") ∨ ("0" ≤ bstr ≤ "9") then
		"sure, there is an exponent!"
	       begin
	         f ← (bstr = "-");
		 if f ∨ (bstr="+") then mylop(bstr);
		 s1 ← scan(bstr,8,b);
		 if abs(log(r) + cvd(s1) * (if f then -1 else 1)) > 35 then
		    begin
		      error("Exponent too long. Eliminated..");
		      s1 ← null;
   		    end
                 else
		   r ← r*10↑(cvd(s1)*(if f then -1 else 1));
	       end
	     else bstr ← "E" & bstr;
          end;
       token ← 3; realval ← r;
     end
  else begin
	 token ← 2; intval ← l;
       end;
end "scannum";

procedure scanidop (string t);
#	The identifier and operator scanning routine;
#	First param is id or op string;
begin "scanidop"
  indx ← symtabentry(t);
  if type[indx]=unknown then
     begin
       token ← type[indx] ← 1;
       index ← indx;
     end
  else begin
	 if type[indx]=1 then index ← indx;
	 token ← type[indx];
       end;
end "scanidop";

internal procedure shift(integer i);
#	The shift procedure. Advances token stream by one token. Leaves in
#	token the type (0=EOF,1=ID,2=INT,3=REAL,4=STRING,>4= reserved word)
#	and in INTVAL, REALVAL, STRINGVAL the corresponding scanned value;
begin "shift"
  push(i);
  token ← 0;
  while ¬ parseflag do
     begin
       while ¬((bstr = '15) ∨ (bstr = null)) do
	 begin
	    while (bstr='40) ∨ (bstr='11) do mylop(bstr);
	    " Now try to see if a number follows "
	    if ("0" ≤ bstr ≤ "9") then
	       begin scannum; return; end
	    else if bstr = "'" then begin scanstring; return end
                 else if bstr = """" then begin scanquotes; continue end
		      else if equ(bstr[1 to 7],"COMMENT") then
			      begin scancomment; continue; end
			   else begin
				  if ("A" ≤ bstr ≤ "Z") then 
				     scanidop(scan(bstr,2,brchar))
				  else begin
					 t ← lop(bstr);
					 # bletch!;
					 if ((t=":" ∨ t="<" ∨ t=">") ∧ bstr="=") ∨
					    (t = ":" ∧ bstr="=") ∨
					    (t = "(" ∧ (bstr = "." ∨ bstr = ":"))∨
					    (t = ")" ∧ (bstr = "." ∨ bstr = ":"))∨
					    (t = "." ∧ bstr=".")
					   then t ← t&lop(bstr);
					 scanidop(t);
				       end;
				  return;
				end;
 	 end;
       inputstr ← bstr ← input(parse,6);
       if parseflag then begin token ← 0; return end;
     end;
end "shift";

internal procedure breakinit;
begin
  setbreak(2,"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",null,"xrk");
  setbreak(6,'12,'13&'14,"inak");
  setbreak(7,"'"&'15,null,"insk");
  setbreak(8,"0123456789",null,"xrk");
  setbreak(9,""""&'15,null,"insk");
  setbreak(10,"=.",null,"xrk");
end;

end "ACCEPT";